Overview

Dataset statistics

Number of variables: 10
Number of observations: 391
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 30.7 KiB
Average record size in memory: 80.3 B

Variable types

Numeric: 9
Categorical: 1

Warnings

Pregnancies is highly correlated with Age (High correlation)
Age is highly correlated with Pregnancies (High correlation)
Glucose is highly correlated with Insulin (High correlation)
Insulin is highly correlated with Glucose (High correlation)
SkinThickness is highly correlated with BMI (High correlation)
BMI is highly correlated with SkinThickness (High correlation)
Age is highly correlated with Outcome and 1 other field (High correlation)
Outcome is highly correlated with Age and 1 other field (High correlation)
Glucose is highly correlated with Outcome and 1 other field (High correlation)
df_index has unique values (Unique)
Pregnancies has 56 (14.3%) zeros (Zeros)

Reproduction

Analysis started: 2021-09-01 01:15:17.265999
Analysis finished: 2021-09-01 01:15:26.239070
Duration: 8.97 seconds
Software version: pandas-profiling v3.0.0
Download configuration: config.json

Variables

df_index
Real number (ℝ≥0)

UNIQUE

Distinct: 391
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 387.5473146
Minimum: 3
Maximum: 765
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 3.2 KiB

Quantile statistics

Minimum: 3
5-th percentile: 46.5
Q1: 203.5
Median: 385
Q3: 567.5
95-th percentile: 722.5
Maximum: 765
Range: 762
Interquartile range (IQR): 364
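
The quantile statistics listed here can be reproduced with a short sketch. It assumes linear interpolation between order statistics (the pandas and NumPy default); whether the report used exactly this method is an assumption. The input is the ten smallest df_index values that appear in this report, used purely as illustrative data.

```python
from statistics import quantiles

# Illustrative data: the ten smallest df_index values in this report.
values = sorted([3, 4, 6, 8, 13, 14, 16, 18, 19, 20])

# method="inclusive" interpolates linearly between order statistics
# (assumed here to match how the report computes Q1, median and Q3).
q1, median, q3 = quantiles(values, n=4, method="inclusive")

value_range = max(values) - min(values)  # Range = Maximum - Minimum
iqr = q3 - q1                            # Interquartile range = Q3 - Q1

print(q1, median, q3, value_range, iqr)
```

The same identities hold for the figures reported for df_index: 765 - 3 = 762 for the range, and 567.5 - 203.5 = 364 for the interquartile range.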

Descriptive statistics

Standard deviation: 216.102545
Coefficient of variation (CV): 0.5576159009
Kurtosis: -1.150114152
Mean: 387.5473146
Median Absolute Deviation (MAD): 182
Skewness: -0.02494131874
Sum: 151531
Variance: 46700.30994
Monotonicity: Strictly increasing
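
Several of the descriptive statistics are simple functions of one another, which the sketch below illustrates. It assumes sample (n - 1) estimators for the standard deviation and variance, as pandas uses by default; skewness and kurtosis are left out because their bias corrections differ between libraries. The input list is illustrative, not the full column.

```python
from statistics import mean, median, stdev, variance

# Illustrative data: the ten smallest df_index values in this report.
values = [3, 4, 6, 8, 13, 14, 16, 18, 19, 20]

std = stdev(values)        # sample standard deviation (n - 1 denominator)
var = variance(values)     # sample variance, equal to std ** 2
cv = std / mean(values)    # coefficient of variation = std / mean

med = median(values)
mad = median(abs(v - med) for v in values)  # median absolute deviation

# "Strictly increasing" means every value is larger than the one before it.
strictly_increasing = all(a < b for a, b in zip(values, values[1:]))

# Consistency check on the figures reported for df_index: CV = std / mean.
reported_cv = 216.102545 / 387.5473146

print(std, var, cv, mad, strictly_increasing, round(reported_cv, 4))
```

Dividing the reported standard deviation by the reported mean gives roughly 0.5576, in line with the coefficient of variation listed for df_index.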
Histogram with fixed size bins (bins=50)
Value               Count  Frequency (%)
511                 1      0.3%
177                 1      0.3%
153                 1      0.3%
156                 1      0.3%
157                 1      0.3%
158                 1      0.3%
159                 1      0.3%
672                 1      0.3%
161                 1      0.3%
162                 1      0.3%
Other values (381)  381    97.4%

Value  Count  Frequency (%)
3      1      0.3%
4      1      0.3%
6      1      0.3%
8      1      0.3%
13     1      0.3%
14     1      0.3%
16     1      0.3%
18     1      0.3%
19     1      0.3%
20     1      0.3%

Value  Count  Frequency (%)
765    1      0.3%
763    1      0.3%
760    1      0.3%
755    1      0.3%
753    1      0.3%
751    1      0.3%
748    1      0.3%
747    1      0.3%
745    1      0.3%
744    1      0.3%

Pregnancies
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct: 17
Distinct (%): 4.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.306905371
Minimum: 0
Maximum: 17
Zeros: 56
Zeros (%): 14.3%
Negative: 0
Negative (%): 0.0%
Memory size: 3.2 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 1
Median: 2
Q3: 5
95-th percentile: 10
Maximum: 17
Range: 17
Interquartile range (IQR): 4

Descriptive statistics

Standard deviation: 3.213421914
Coefficient of variation (CV): 0.9717308341
Kurtosis: 1.476228718
Mean: 3.306905371
Median Absolute Deviation (MAD): 1
Skewness: 1.331966876
Sum: 1293
Variance: 10.3260804
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=17)
Value             Count  Frequency (%)
1                 92     23.5%
2                 64     16.4%
0                 56     14.3%
3                 45     11.5%
4                 27     6.9%
5                 21     5.4%
7                 20     5.1%
6                 19     4.9%
8                 14     3.6%
9                 11     2.8%
Other values (7)  22     5.6%

Value  Count  Frequency (%)
0      56     14.3%
1      92     23.5%
2      64     16.4%
3      45     11.5%
4      27     6.9%
5      21     5.4%
6      19     4.9%
7      20     5.1%
8      14     3.6%
9      11     2.8%

Value  Count  Frequency (%)
17     1      0.3%
15     1      0.3%
14     1      0.3%
13     3      0.8%
12     5      1.3%
11     5      1.3%
10     6      1.5%
9      11     2.8%
8      14     3.6%
7      20     5.1%

Glucose
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 117
Distinct (%): 29.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 122.6956522
Minimum: 56
Maximum: 198
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 3.2 KiB

Quantile statistics

Minimum: 56
5-th percentile: 81
Q1: 99
Median: 119
Q3: 143
95-th percentile: 181
Maximum: 198
Range: 142
Interquartile range (IQR): 44

Descriptive statistics

Standard deviation: 30.87081364
Coefficient of variation (CV): 0.2516047887
Kurtosis: -0.4860331345
Mean: 122.6956522
Median Absolute Deviation (MAD): 21
Skewness: 0.5136803871
Sum: 47974
Variance: 953.0071349
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value               Count  Frequency (%)
100                 14     3.6%
99                  10     2.6%
129                 9      2.3%
95                  8      2.0%
88                  8      2.0%
128                 7      1.8%
126                 7      1.8%
109                 7      1.8%
117                 7      1.8%
84                  6      1.5%
Other values (107)  308    78.8%

Value  Count  Frequency (%)
56     1      0.3%
68     3      0.8%
71     2      0.5%
74     3      0.8%
75     1      0.3%
77     2      0.5%
78     2      0.5%
79     2      0.5%
80     2      0.5%
81     5      1.3%

Value  Count  Frequency (%)
198    1      0.3%
197    2      0.5%
196    2      0.5%
195    1      0.3%
193    1      0.3%
191    1      0.3%
189    2      0.5%
188    1      0.3%
187    4      1.0%
186    1      0.3%

BloodPressure
Real number (ℝ≥0)

Distinct: 37
Distinct (%): 9.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 70.68030691
Minimum: 24
Maximum: 110
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 3.2 KiB

Quantile statistics

Minimum: 24
5-th percentile: 50
Q1: 62
Median: 70
Q3: 78
95-th percentile: 90
Maximum: 110
Range: 86
Interquartile range (IQR): 16

Descriptive statistics

Standard deviation: 12.50754012
Coefficient of variation (CV): 0.1769593352
Kurtosis: 0.7915695858
Mean: 70.68030691
Median Absolute Deviation (MAD): 8
Skewness: -0.09122008091
Sum: 27636
Variance: 156.4385599
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=37)
Value              Count  Frequency (%)
70                 31     7.9%
74                 30     7.7%
64                 26     6.6%
68                 24     6.1%
72                 23     5.9%
78                 23     5.9%
76                 20     5.1%
60                 20     5.1%
62                 19     4.9%
58                 18     4.6%
Other values (27)  157    40.2%

Value  Count  Frequency (%)
24     1      0.3%
30     2      0.5%
38     1      0.3%
40     1      0.3%
44     3      0.8%
46     2      0.5%
48     3      0.8%
50     10     2.6%
52     6      1.5%
54     8      2.0%

Value  Count  Frequency (%)
110    2      0.5%
106    2      0.5%
102    1      0.3%
100    2      0.5%
98     1      0.3%
94     2      0.5%
92     1      0.3%
90     11     2.8%
88     15     3.8%
86     11     2.8%

SkinThickness
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 48
Distinct (%): 12.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 29.15089514
Minimum: 7
Maximum: 63
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 3.2 KiB

Quantile statistics

Minimum: 7
5-th percentile: 13
Q1: 21
Median: 29
Q3: 37
95-th percentile: 46.5
Maximum: 63
Range: 56
Interquartile range (IQR): 16

Descriptive statistics

Standard deviation: 10.52933596
Coefficient of variation (CV): 0.3612011197
Kurtosis: -0.4641133551
Mean: 29.15089514
Median Absolute Deviation (MAD): 8
Skewness: 0.2075296064
Sum: 11398
Variance: 110.8669159
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=48)
Value              Count  Frequency (%)
32                 20     5.1%
30                 18     4.6%
33                 17     4.3%
23                 16     4.1%
18                 16     4.1%
29                 14     3.6%
26                 14     3.6%
28                 13     3.3%
27                 13     3.3%
25                 12     3.1%
Other values (38)  238    60.9%

Value  Count  Frequency (%)
7      2      0.5%
8      1      0.3%
10     3      0.8%
11     5      1.3%
12     6      1.5%
13     10     2.6%
14     6      1.5%
15     11     2.8%
16     5      1.3%
17     10     2.6%

Value  Count  Frequency (%)
63     1      0.3%
60     1      0.3%
56     1      0.3%
52     2      0.5%
51     1      0.3%
50     3      0.8%
49     3      0.8%
48     4      1.0%
47     4      1.0%
46     7      1.8%

Insulin
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 184
Distinct (%): 47.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 156.2327366
Minimum: 14
Maximum: 846
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 3.2 KiB

Quantile statistics

Minimum: 14
5-th percentile: 42.5
Q1: 76.5
Median: 126
Q3: 190
95-th percentile: 397
Maximum: 846
Range: 832
Interquartile range (IQR): 113.5

Descriptive statistics

Standard deviation: 118.9424319
Coefficient of variation (CV): 0.7613156788
Kurtosis: 6.33576778
Mean: 156.2327366
Median Absolute Deviation (MAD): 54
Skewness: 2.161212146
Sum: 61087
Variance: 14147.30211
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value               Count  Frequency (%)
105                 11     2.8%
140                 9      2.3%
130                 9      2.3%
120                 8      2.0%
94                  7      1.8%
180                 7      1.8%
100                 7      1.8%
115                 6      1.5%
110                 6      1.5%
135                 6      1.5%
Other values (174)  315    80.6%

Value  Count  Frequency (%)
14     1      0.3%
15     1      0.3%
16     1      0.3%
18     2      0.5%
22     1      0.3%
23     1      0.3%
25     1      0.3%
29     1      0.3%
32     1      0.3%
36     3      0.8%

Value  Count  Frequency (%)
846    1      0.3%
744    1      0.3%
680    1      0.3%
600    1      0.3%
579    1      0.3%
545    1      0.3%
543    1      0.3%
540    1      0.3%
510    1      0.3%
495    2      0.5%

BMI
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 194
Distinct (%): 49.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 307.9565217
Minimum: 24
Maximum: 671
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 3.2 KiB

Quantile statistics

Minimum: 24
5-th percentile: 33
Q1: 262.5
Median: 328
Q3: 368.5
95-th percentile: 449
Maximum: 671
Range: 647
Interquartile range (IQR): 106

Descriptive statistics

Standard deviation: 105.992238
Coefficient of variation (CV): 0.3441792283
Kurtosis: 1.79883416
Mean: 307.9565217
Median Absolute Deviation (MAD): 50
Skewness: -0.9517253983
Sum: 120411
Variance: 11234.35452
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value               Count  Frequency (%)
333                 7      1.8%
32                  7      1.8%
316                 6      1.5%
336                 5      1.3%
287                 5      1.3%
259                 5      1.3%
345                 5      1.3%
355                 5      1.3%
252                 5      1.3%
394                 5      1.3%
Other values (184)  336    85.9%

Value  Count  Frequency (%)
24     3      0.8%
25     1      0.3%
26     2      0.5%
28     1      0.3%
29     2      0.5%
30     3      0.8%
31     1      0.3%
32     7      1.8%
34     4      1.0%
35     2      0.5%

Value  Count  Frequency (%)
671    1      0.3%
594    1      0.3%
573    1      0.3%
532    1      0.3%
523    1      0.3%
497    1      0.3%
479    1      0.3%
468    1      0.3%
467    1      0.3%
465    1      0.3%

DiabetesPedigreeFunction
Real number (ℝ≥0)

Distinct: 329
Distinct (%): 84.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 481.1534527
Minimum: 4
Maximum: 2329
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 3.2 KiB

Quantile statistics

Minimum: 4
5-th percentile: 44.5
Q1: 245.5
Median: 422
Q3: 669.5
95-th percentile: 1086
Maximum: 2329
Range: 2325
Interquartile range (IQR): 424

Descriptive statistics

Standard deviation: 340.6676582
Coefficient of variation (CV): 0.7080228901
Kurtosis: 4.941573707
Mean: 481.1534527
Median Absolute Deviation (MAD): 199
Skewness: 1.596740552
Sum: 188131
Variance: 116054.4533
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value               Count  Frequency (%)
692                 4      1.0%
261                 3      0.8%
452                 3      0.8%
496                 3      0.8%
26                  3      0.8%
422                 3      0.8%
299                 3      0.8%
687                 3      0.8%
128                 2      0.5%
412                 2      0.5%
Other values (319)  362    92.6%

Value  Count  Frequency (%)
4      1      0.3%
6      1      0.3%
14     1      0.3%
15     2      0.5%
16     2      0.5%
23     1      0.3%
24     1      0.3%
26     3      0.8%
27     1      0.3%
28     2      0.5%

Value  Count  Frequency (%)
2329   1      0.3%
2288   1      0.3%
2137   1      0.3%
1699   1      0.3%
1391   1      0.3%
1353   1      0.3%
1321   1      0.3%
1318   1      0.3%
1292   1      0.3%
1268   1      0.3%

Age
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 43
Distinct (%): 11.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 30.89002558
Minimum: 21
Maximum: 81
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 3.2 KiB

Quantile statistics

Minimum: 21
5-th percentile: 21
Q1: 23
Median: 27
Q3: 36
95-th percentile: 52.5
Maximum: 81
Range: 60
Interquartile range (IQR): 13

Descriptive statistics

Standard deviation: 10.20159252
Coefficient of variation (CV): 0.3302552307
Kurtosis: 1.73201516
Mean: 30.89002558
Median Absolute Deviation (MAD): 5
Skewness: 1.401759213
Sum: 12078
Variance: 104.07249
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=43)
Value              Count  Frequency (%)
22                 43     11.0%
21                 32     8.2%
24                 31     7.9%
25                 30     7.7%
23                 28     7.2%
26                 24     6.1%
28                 21     5.4%
27                 14     3.6%
29                 14     3.6%
31                 12     3.1%
Other values (33)  142    36.3%

Value  Count  Frequency (%)
21     32     8.2%
22     43     11.0%
23     28     7.2%
24     31     7.9%
25     30     7.7%
26     24     6.1%
27     14     3.6%
28     21     5.4%
29     14     3.6%
30     10     2.6%

Value  Count  Frequency (%)
81     1      0.3%
63     1      0.3%
61     1      0.3%
60     2      0.5%
59     1      0.3%
58     4      1.0%
57     2      0.5%
56     1      0.3%
55     2      0.5%
54     2      0.5%

Outcome
Categorical

HIGH CORRELATION

Distinct: 2
Distinct (%): 0.5%
Missing: 0
Missing (%): 0.0%
Memory size: 3.2 KiB

Value  Count
0      261
1      130

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 391
Distinct characters: 2
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
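
As a small illustration of these character properties, the general category of each character can be looked up with Python's standard `unicodedata` module. This is only a sketch: it covers the category property, since the standard library does not expose script or block, and it uses the five sample values shown for Outcome in this report rather than the full column.

```python
import unicodedata
from collections import Counter

# The five sample values shown for Outcome, joined into one string.
sample_values = ["0", "1", "1", "1", "1"]
text = "".join(sample_values)

# unicodedata.category returns a two-letter general category code;
# "Nd" stands for "Number, decimal digit" (shown as "Decimal Number" here).
category_counts = Counter(unicodedata.category(ch) for ch in text)
distinct_characters = len(set(text))

print(category_counts, distinct_characters)
```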

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0
2nd row: 1
3rd row: 1
4th row: 1
5th row: 1

Common Values

Value  Count  Frequency (%)
0      261    66.8%
1      130    33.2%
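
A frequency table of this kind can be rebuilt from raw values with `collections.Counter`. The sketch below reconstructs the Outcome column from the counts reported here (261 zeros and 130 ones); the variable names are illustrative.

```python
from collections import Counter

# Rebuild the Outcome column from the counts in this report.
outcomes = ["0"] * 261 + ["1"] * 130

counts = Counter(outcomes)
total = sum(counts.values())

# One (value, count, percentage) row per distinct value, most common first.
table = [(value, count, round(100 * count / total, 1))
         for value, count in counts.most_common()]

for value, count, pct in table:
    print(f"{value}  {count}  {pct}%")
```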

Length

Histogram of lengths of the category

Pie chart

Value  Count  Frequency (%)
0      261    66.8%
1      130    33.2%

Most occurring characters

Value  Count  Frequency (%)
0      261    66.8%
1      130    33.2%

Most occurring categories

Value           Count  Frequency (%)
Decimal Number  391    100.0%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
0      261    66.8%
1      130    33.2%

Most occurring scripts

Value   Count  Frequency (%)
Common  391    100.0%

Most frequent character per script

Common
Value  Count  Frequency (%)
0      261    66.8%
1      130    33.2%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  391    100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
0      261    66.8%
1      130    33.2%

Interactions

[Figures: 81 interaction plots]

Correlations


Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
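
The calculation can be sketched in plain Python. This is a minimal illustration, not the code that pandas-profiling runs; the function name `pearson_r` and the sample lists are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sample covariance and standard deviations (n - 1 denominator);
    # the choice of n or n - 1 cancels out in the ratio.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x) / (n - 1))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y) / (n - 1))
    return cov / (sd_x * sd_y)


# Hypothetical sample data: y tends to rise with x, but not perfectly linearly.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r = pearson_r(x, y)

# Shifting y and rescaling it by a positive factor leaves r unchanged.
r_rescaled = pearson_r(x, [10 * b + 3 for b in y])

print(r, r_rescaled)
```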

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better than Pearson's r at capturing nonlinear monotonic correlations. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
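
As a sketch of this procedure, the example below ranks each variable (ties receive the average of their positions, a common convention assumed here) and then applies the Pearson formula to the ranks. The helper names and the sample data are hypothetical.

```python
from math import sqrt


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson_r(x, y):
    """Covariance divided by the product of the standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def spearman_rho(x, y):
    """Pearson's r computed on the rank variables of x and y."""
    return pearson_r(average_ranks(x), average_ranks(y))


# Hypothetical data: y = x squared is monotonic in x but not linear.
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]

rho = spearman_rho(x, y)
r = pearson_r(x, y)
print(rho, r)
```

On this data ρ reaches 1 because the relationship is perfectly monotonic, while Pearson's r stays below 1 because it is not linear.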

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
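
The pair-counting definition can be sketched directly. The function below implements the plain form described here (often called tau-a), which makes no adjustment for ties; libraries such as SciPy default to a tie-adjusted variant, so results can differ on data with repeated values. The function name and sample data are hypothetical.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    concordant = discordant = 0
    pairs = list(combinations(zip(x, y), 2))
    for (x1, y1), (x2, y2) in pairs:
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:      # both variables move the same way
            concordant += 1
        elif direction < 0:    # the variables move in opposite ways
            discordant += 1
        # direction == 0 is a tie and counts as neither
    return (concordant - discordant) / len(pairs)


# Hypothetical data: only the middle two y values are out of order.
x = [1, 2, 3, 4]
y = [1, 3, 2, 4]

tau = kendall_tau_a(x, y)
print(tau)
```

With four observations there are six pairs; five are concordant and one is discordant, giving τ = (5 - 1) / 6.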

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available from the Phik project.

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

First rows

Row  df_index  Pregnancies  Glucose  BloodPressure  SkinThickness  Insulin  BMI    DiabetesPedigreeFunction  Age  Outcome
0    3         1.0          89.0     66.0           23.0           94.0     281.0  167.0                     21   0
1    4         0.0          137.0    40.0           35.0           168.0    431.0  2288.0                    33   1
2    6         3.0          78.0     50.0           32.0           88.0     31.0   248.0                     26   1
3    8         2.0          197.0    70.0           45.0           543.0    305.0  158.0                     53   1
4    13        1.0          189.0    60.0           23.0           846.0    301.0  398.0                     59   1
5    14        5.0          166.0    72.0           19.0           175.0    258.0  587.0                     51   1
6    16        0.0          118.0    84.0           47.0           230.0    458.0  551.0                     31   1
7    18        1.0          103.0    30.0           38.0           83.0     433.0  183.0                     33   0
8    19        1.0          115.0    70.0           30.0           96.0     346.0  529.0                     32   1
9    20        3.0          126.0    88.0           41.0           235.0    393.0  704.0                     27   0

Last rows

Row  df_index  Pregnancies  Glucose  BloodPressure  SkinThickness  Insulin  BMI    DiabetesPedigreeFunction  Age  Outcome
381  744       13.0         153.0    88.0           37.0           140.0    406.0  1174.0                    39   0
382  745       12.0         100.0    84.0           33.0           105.0    30.0   488.0                     46   0
383  747       1.0          81.0     74.0           41.0           57.0     463.0  1096.0                    32   0
384  748       3.0          187.0    70.0           22.0           200.0    364.0  408.0                     36   1
385  751       1.0          121.0    78.0           39.0           74.0     39.0   261.0                     28   0
386  753       0.0          181.0    88.0           44.0           510.0    433.0  222.0                     26   1
387  755       1.0          128.0    88.0           39.0           110.0    365.0  1057.0                    37   1
388  760       2.0          88.0     58.0           26.0           16.0     284.0  766.0                     22   0
389  763       10.0         101.0    76.0           48.0           180.0    329.0  171.0                     63   0
390  765       5.0          121.0    72.0           23.0           112.0    262.0  245.0                     30   0